84 research outputs found
Matrix eQTL: Ultra fast eQTL analysis via large matrix operations
Expression quantitative trait loci (eQTL) mapping aims to determine genomic
regions that regulate gene transcription. Expression QTL is used to study the
regulatory structure of normal tissues and to search for genetic factors in
complex diseases such as cancer, diabetes, and cystic fibrosis. A modern eQTL
dataset contains millions of SNPs and thousands of transcripts measured for
hundreds of samples. This makes the analysis computationally complex as it
involves independent testing for association for every transcript-SNP pair. The
heavy computational burden makes eQTL analysis less popular and often forces
analysts to restrict their attention to just a subset of transcripts and SNPs.
As larger genotype and gene expression datasets become available, the demand
for fast tools for eQTL analysis increases. We present a new method for fast
eQTL analysis via linear models, called Matrix eQTL. Matrix eQTL can model and
test for association using both linear regression and ANOVA models. The models
can include covariates to account for such factors as population structure,
gender, and clinical variables. It also supports testing of heteroscedastic
models and models with correlated errors. In our experiment on large datasets
Matrix eQTL was thousands of times faster than the existing popular software
for QTL/eQTL analysis. Matrix eQTL is implemented as both Matlab and R packages
and thus can easily be run on Windows, Mac OS, and Linux systems. The software
is freely available at the following address:
http://www.bios.unc.edu/research/genomic_software/Matrix_eQTL
Comment: 9 pages, 1 figure
An Empirical Bayes Approach for Multiple Tissue eQTL Analysis
Expression quantitative trait loci (eQTL) analyses, which identify genetic
markers associated with the expression of a gene, are an important tool in the
understanding of diseases in human and other populations. While most eQTL
studies to date consider the connection between genetic variation and
expression in a single tissue, complex, multi-tissue data sets are now being
generated by the GTEx initiative. These data sets have the potential to improve
the findings of single tissue analyses by borrowing strength across tissues,
and the potential to elucidate the genotypic basis of differences between
tissues.
In this paper we introduce and study a multivariate hierarchical Bayesian
model (MT-eQTL) for multi-tissue eQTL analysis. MT-eQTL directly models the
vector of correlations between expression and genotype across tissues. It
explicitly captures patterns of variation in the presence or absence of eQTLs,
as well as the heterogeneity of effect sizes across tissues. Moreover, the
model is applicable to complex designs in which the set of donors can (i) vary
from tissue to tissue, and (ii) exhibit incomplete overlap between tissues. The
MT-eQTL model is marginally consistent, in the sense that the model for a
subset of tissues can be obtained from the full model via marginalization.
Fitting of the MT-eQTL model is carried out via empirical Bayes, using an
approximate EM algorithm. Inferences concerning eQTL detection and the
configuration of eQTLs across tissues are derived from adaptive thresholding of
local false discovery rates, and maximum a-posteriori estimation, respectively.
We investigate the MT-eQTL model through a simulation study, and rigorously
establish the FDR control of the local FDR testing procedure under mild
assumptions appropriate for dependent data.
Comment: accepted by Biostatistics
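The MT-eQTL abstract above mentions eQTL detection by adaptive thresholding of local false discovery rates. A minimal sketch of one common form of that rule, assuming the local FDR values have already been estimated: sort them in increasing order and reject the largest leading set whose average local FDR stays at or below the target level. The function name and the example values are hypothetical, and this is not claimed to be the authors' exact procedure.

```python
def lfdr_adaptive_threshold(lfdr, alpha):
    """Return sorted indices of discoveries at target FDR level alpha.

    lfdr: list of local false discovery rates, one per test
    alpha: target false discovery rate
    Rejects the k tests with the smallest lfdr, where k is the largest
    value such that the mean of those k lfdr values is <= alpha.
    """
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    running_sum = 0.0
    k = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lfdr[idx]
        if running_sum / rank <= alpha:
            k = rank
        else:
            # Values are sorted, so the running mean cannot decrease again.
            break
    return sorted(order[:k])

# Hypothetical local FDR values for five gene-SNP pairs.
example_lfdr = [0.01, 0.9, 0.03, 0.2, 0.5]
discoveries = lfdr_adaptive_threshold(example_lfdr, alpha=0.1)
```

The threshold is adaptive in the sense that the cutoff on the local FDR depends on the data: here a pair with local FDR 0.2 is still rejected because the average over the rejected set remains below 0.1.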
Reconstruction of a low-rank matrix in the presence of Gaussian noise
This paper addresses the problem of reconstructing a low-rank signal matrix observed with additive Gaussian noise. We first establish that, under mild assumptions, one can restrict attention to orthogonally equivariant reconstruction methods, which act only on the singular values of the observed matrix and do not affect its singular vectors. Using recent results in random matrix theory, we then propose a new reconstruction method that aims to reverse the effect of the noise on the singular value decomposition of the signal matrix. In conjunction with the proposed reconstruction method we also introduce a Kolmogorov–Smirnov based estimator of the noise variance.
Computational tools for discovery and interpretation of expression quantitative trait loci
Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators who are interested in discovery of eQTL, or in using them to interpret genome-wide association study results, may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation.
Finding large average submatrices in high dimensional data
The search for sample-variable associations is an important problem in the
exploratory analysis of high dimensional data. Biclustering methods search for
sample-variable associations in the form of distinguished submatrices of the
data matrix. (The rows and columns of a submatrix need not be contiguous.) In
this paper we propose and evaluate a statistically motivated biclustering
procedure (LAS) that finds large average submatrices within a given real-valued
data matrix. The procedure operates in an iterative-residual fashion, and is
driven by a Bonferroni-based significance score that effectively trades off
between submatrix size and average value. We examine the performance and
potential utility of LAS, and compare it with a number of existing methods,
through an extensive three-part validation study using two gene expression
datasets. The validation study examines quantitative properties of biclusters,
biological and clinical assessments using auxiliary information, and
classification of disease subtypes using bicluster membership. In addition, we
carry out a simulation study to assess the effectiveness and noise sensitivity
of the LAS search procedure. These results suggest that LAS is an effective
exploratory tool for the discovery of biologically relevant structures in high
dimensional data. Software is available at https://genome.unc.edu/las/.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS239 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
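The LAS abstract above refers to a Bonferroni-based significance score that trades off submatrix size against average value. A minimal sketch of such a score, assuming the data matrix has been standardized so that entries behave like independent standard normal values under the null: the Gaussian tail probability of the observed average is multiplied by the number of submatrices of the same shape, and the negative log is taken. The function name and the exact formula are assumptions made for illustration and are not taken from the abstract.

```python
import math

def las_score(m, n, k, l, avg):
    """Bonferroni-corrected significance score of a k-by-l submatrix.

    m, n: dimensions of the full data matrix
    k, l: dimensions of the candidate submatrix
    avg:  average of the submatrix entries
    Assumed null model: i.i.d. N(0, 1) entries, so the submatrix average
    has standard deviation 1 / sqrt(k * l).
    """
    z = avg * math.sqrt(k * l)
    # Upper tail of the standard normal, P(Z >= z).
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    # Number of k-by-l submatrices (rows and columns need not be contiguous).
    log_count = math.log(math.comb(m, k)) + math.log(math.comb(n, l))
    # Note: for very large z the tail underflows to 0.0 and math.log fails;
    # a production version would work with a log-tail approximation.
    return -(log_count + math.log(tail))

# Same shape, higher average: the score should increase.
score_low = las_score(100, 50, 5, 5, 1.0)
score_high = las_score(100, 50, 5, 5, 2.0)
```

A larger submatrix pays a larger combinatorial penalty but gains from the reduced variance of its average, which is the trade-off the abstract describes.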
Genome-wide association study meta-analysis of suicide death and suicidal behavior
Suicide is a worldwide health crisis. We aimed to identify genetic risk variants associated with suicide death and suicidal behavior. Meta-analysis for suicide death was performed using 3765 cases from Utah and matching 6572 controls of European ancestry. Meta-analysis for suicidal behavior using data across five cohorts (n = 8315 cases and 256,478 psychiatric or populational controls of European ancestry) was also performed. One locus in neuroligin 1 (NLGN1) passing the genome-wide significance threshold for suicide death was identified (top SNP rs73182688, with p = 5.48 × 10^-8 before and p = 4.55 × 10^-8 after mtCOJO analysis conditioning on MDD to remove genetic effects on suicide mediated by MDD). Conditioning on suicidal attempts did not significantly change the association strength (p = 6.02 × 10^-8), suggesting suicide death specificity. NLGN1 encodes a member of a family of neuronal cell surface proteins. Members of this family act as splice site-specific ligands for beta-neurexins and may be involved in synaptogenesis. The NRXN-NLGN pathway was previously implicated in suicide, autism, and schizophrenia. We additionally identified ROBO2 and ZNF28 associations with suicidal behavior in the meta-analysis across five cohorts in gene-based association analysis using MAGMA. Lastly, we replicated two loci including variants near SOX5 and LOC101928519 associated with suicidal attempts identified in the ISGC and MVP meta-analysis using the independent FinnGen samples. Suicide death and suicidal behavior showed positive genetic correlations with depression, schizophrenia, pain, and suicidal attempt, and negative genetic correlation with educational attainment. These correlations remained significant after conditioning on depression, suggesting pleiotropic effects among these traits.
Bidirectional generalized summary-data-based Mendelian randomization analysis suggests that genetic risk for the suicidal attempt and suicide death are both bi-directionally causal for MDD.
Peer reviewed
seeQTL: a searchable database for human eQTLs
Summary: seeQTL is a comprehensive and versatile eQTL database, including various eQTL studies and a meta-analysis of HapMap eQTL information. The database presents eQTL association results in a convenient browser, using both segmented local-association plots and genome-wide Manhattan plots.
FastMap: Fast eQTL mapping in homozygous populations
Motivation: Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression. The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the number of transcripts and markers are on the order of 10^5 and 10^5–10^6, respectively.
Refinement of schizophrenia GWAS loci using methylome-wide association data
Recent genome-wide association studies (GWAS) have made substantial progress in identifying disease loci. The next logical step is to design functional experiments to identify disease mechanisms. This step, however, is often hampered by the large size of loci identified in GWAS that is caused by linkage disequilibrium (LD) between SNPs. In this study, we demonstrate how integrating methylome-wide association study (MWAS) results with GWAS findings can narrow down the location for a subset of the putative causal sites. We use the disease schizophrenia as an example. To handle "data analytic" variation we first combined our MWAS results separately with two GWAS meta-analyses (N=32,143 and 21,953) that had largely overlapping samples but different data analysis pipelines. Permutation tests showed significant overlapping association signals between GWAS and MWAS findings. This significant overlap justified prioritizing loci based on the concordance principle. To further ensure that the methylation signal was not driven by chance, we successfully replicated the top three methylation findings near genes SDCCAG8, CREB1 and ATXN7 in an independent sample using targeted pyrosequencing. In contrast to the SNPs in the selected region, the methylation sites were largely uncorrelated, explaining why the methylation signals implicated much smaller regions (median size 78bp). The refined loci showed considerable enrichment of genomic elements of possible functional importance and suggested specific hypotheses about schizophrenia etiology. Several hypotheses involved possible variation in transcription factor binding efficiencies.
Basal-like Breast cancer DNA copy number losses identify genes involved in genomic instability, response to therapy, and patient survival
Breast cancer is a heterogeneous disease with known expression-defined tumor subtypes. DNA copy number studies have suggested that tumors within gene expression subtypes share similar DNA copy number aberrations (CNA) and that CNA can be used to further sub-divide expression classes. To gain further insights into the etiologies of the intrinsic subtypes, we classified tumors according to gene expression subtype and next identified subtype-associated CNA using a novel method called SWITCHdna, using a training set of 180 tumors and a validation set of 359 tumors. Fisher's exact tests, Chi-square approximations, and Wilcoxon rank-sum tests were performed to evaluate differences in CNA by subtype. To assess the functional significance of loss of a specific chromosomal region, individual genes were knocked down by shRNA, and drug sensitivity and DNA repair foci assays were performed. Most tumor subtypes exhibited specific CNA. The Basal-like subtype was the most distinct with common losses of the regions containing RB1, BRCA1, INPP4B, and the greatest overall genomic instability. One Basal-like subtype-associated CNA was loss of 5q11–35, which contains at least three genes important for BRCA1-dependent DNA repair (RAD17, RAD50, and RAP80); these genes were predominantly lost as a pair, or all three simultaneously. Loss of two or three of these genes was associated with significantly increased genomic instability and poor patient survival. RNAi knockdown of RAD17, or RAD17/RAD50, in immortalized human mammary epithelial cell lines caused increased sensitivity to a PARP inhibitor and carboplatin, and inhibited BRCA1 foci formation in response to DNA damage.
These data suggest a possible genetic cause for genomic instability in Basal-like breast cancers and a biological rationale for the use of DNA repair inhibitor related therapeutics in this breast cancer subtype.
Electronic supplementary material: The online version of this article (doi:10.1007/s10549-011-1846-y) contains supplementary material, which is available to authorized users.